85 research outputs found

Duration modeling with expanded HMM applied to speech recognition

Author: Bonafonte Cávez Antonio
Nogueiras Rodríguez Albino
Vidal Manzano José
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996

The occupancy of the HMM states is modeled by means of a Markov chain. A linear estimator is introduced to compute the probabilities of the Markov chain. The distribution function (DF) accurately represents the observed data. Representing the DF as a Markov chain allows the use of standard HMM recognizers. The increase in complexity is negligible in training and strongly limited during recognition. Experiments on acoustic-phonetic decoding show that the phone recognition rate increases from 60.6 to 61.1. Furthermore, on a database inquiry task, where phones are used as subword units, the correct word rate increases from 88.2 to 88.4. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC
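
The idea in the abstract above, representing a state-duration distribution as a Markov chain so that a standard HMM recognizer can use it, can be illustrated with a small sketch. This is not the paper's linear estimator: it assumes a simple left-to-right chain of sub-states in which sub-state k is left with an exit probability equal to the discrete hazard of the target duration distribution. The function names `hazards_from_pmf` and `pmf_from_chain` are hypothetical.

```python
def hazards_from_pmf(pmf):
    """Exit probability of each sub-state, given pmf[d-1] = P(duration = d).

    Assumption: sub-state k is left with probability
    h_k = P(d = k) / P(d >= k), the discrete hazard of the target pmf.
    """
    hazards = []
    survival = 1.0  # P(duration >= current d)
    for p in pmf:
        hazards.append(min(1.0, p / survival) if survival > 0 else 1.0)
        survival -= p
    return hazards


def pmf_from_chain(hazards, max_d):
    """Duration pmf implied by the chain of sub-states.

    The last sub-state self-loops, so durations beyond len(hazards)
    follow a geometric tail.
    """
    pmf = []
    stay = 1.0  # probability of still being in the expanded state
    for d in range(1, max_d + 1):
        h = hazards[min(d, len(hazards)) - 1]
        pmf.append(stay * h)
        stay *= 1.0 - h
    return pmf


target = [0.1, 0.3, 0.4, 0.2]          # illustrative duration pmf, d = 1..4
exit_probs = hazards_from_pmf(target)
rebuilt = pmf_from_chain(exit_probs, len(target))
```

Because every quantity is an ordinary transition probability, the expanded state needs no special handling in a standard Viterbi or Baum-Welch implementation; the cost is the extra sub-states.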

Ponderación ML de parámetros en un sistema de reconocimiento de palabras basado en CDHMM (ML weighting of parameters in a CDHMM-based word recognition system)

Author: Hernando Pericás Francisco Javier
Nogueiras Rodríguez Albino
Valverde Amador Antonio Javier
Publication date: 01/01/1996

Dynamic speech features are routinely used in current speech recognition systems in combination with short-term (static) spectral features. The aim of this paper is to propose a method to automatically estimate the optimum weighting of static and dynamic features in a speech recognition system. The recognition system considered is based on Continuous-Density Hidden Markov Modelling (CDHMM), which is widely used in speech recognition. Our approach consists basically of 1) adding two new parameters for each state of each model that weight both kinds of speech features, and 2) estimating those parameters by means of Maximum Likelihood training. Experimental results in speaker-independent digit recognition show an important increase in recognition accuracy. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC
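
As a rough illustration of the per-state weighting described in the abstract above, the sketch below scores a frame with two weights per state, one for the static stream and one for the dynamic stream. It assumes the weights act as exponents on the stream likelihoods (that is, as multipliers in the log domain) and that each stream is a single diagonal-covariance Gaussian; the abstract does not specify either choice, and the weight estimation itself is not shown. The names `diag_gauss_logpdf`, `weighted_state_loglik`, and the dictionary keys are hypothetical.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_state_loglik(static, dynamic, state):
    """Frame score for one state with separate static/dynamic weights.

    Assumption: the two per-state weights multiply the log-likelihoods
    of the static and dynamic feature streams.
    """
    log_static = diag_gauss_logpdf(static, state["static_mean"], state["static_var"])
    log_dynamic = diag_gauss_logpdf(dynamic, state["dynamic_mean"], state["dynamic_var"])
    return state["w_static"] * log_static + state["w_dynamic"] * log_dynamic


# Illustrative one-dimensional state; all values are made up.
state = {
    "static_mean": [0.0], "static_var": [1.0],
    "dynamic_mean": [0.0], "dynamic_var": [1.0],
    "w_static": 1.0, "w_dynamic": 0.5,
}
score = weighted_state_loglik([0.2], [1.0], state)
```

Setting both weights to 1 recovers the usual unweighted state score, so the weighted model contains the baseline as a special case.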

Multidialectal acoustic modeling: a comparative study

Author: Caballero Galeote Mónica
Moreno Bilbao M. Asunción
Nogueiras Rodríguez Albino
Publication date: 01/01/2006

In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. The approaches developed differ in the way they evaluate the similarity of sounds between dialects and in the decision tree structure applied. The proposed systems are tested with Spanish dialects across Spain and Latin America. All proposed multidialectal systems improve on monodialectal performance by using data from another dialect, but it is shown that the way the data is shared is critical. The best combination of similarity measure and tree structure achieves an improvement of 7% over the results obtained with monodialectal systems. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

An adaptive gradient-search based algorithm for discriminative training of hmm's

Author: Mariño Acebal José Bernardo
Monte Moreno Enrique
Nogueiras Rodríguez Albino
Publication venue: Robert H. Mannel and Jordi Robert-Ribes
Publication date: 01/01/1998

Although it has proved to be a very powerful tool in acoustic modelling, discriminative training has a major drawback: it lacks a formulation that guarantees convergence regardless of the initial conditions, as the Baum-Welch algorithm does in maximum likelihood training. For this reason, a gradient descent search is usually used in this kind of problem. Unfortunately, standard gradient descent algorithms rely heavily on the choice of the learning rates. This dependence is especially cumbersome because it means that, at each run of the discriminative training procedure, a search must be carried out over the parameters governing the algorithm. In this paper we describe an adaptive procedure for determining the optimal value of the step size at each iteration. While the computational and memory overhead of the algorithm is negligible, results show less dependence on the initial learning rate than standard gradient descent and, when the same idea is used to apply self-scaling, it clearly outperforms it. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC
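
The sensitivity to the initial learning rate mentioned in the abstract above can be illustrated with a generic step-size adaptation rule. The sketch below is not the paper's procedure, which the abstract does not detail; it swaps in a simple "bold driver" style heuristic that grows the step after a successful update and shrinks it after a failed one. The function name `adaptive_gradient_descent`, the parameters `grow` and `shrink`, and the quadratic test objective are all assumptions made for illustration.

```python
def adaptive_gradient_descent(loss, grad, x0, lr0, iters=500, grow=1.1, shrink=0.5):
    """Scalar gradient descent with a self-adjusting step size.

    Assumption: a 'bold driver' rule. A step that lowers the loss is
    accepted and the step size grows; otherwise the step is rejected
    and the step size shrinks.
    """
    x, lr = x0, lr0
    current = loss(x)
    for _ in range(iters):
        candidate = x - lr * grad(x)
        candidate_loss = loss(candidate)
        if candidate_loss < current:
            x, current = candidate, candidate_loss
            lr *= grow
        else:
            lr *= shrink
    return x


def quadratic(x):
    # Toy objective with its minimum at x = 3.
    return (x - 3.0) ** 2


def quadratic_grad(x):
    return 2.0 * (x - 3.0)


# Very different initial learning rates end up near the same minimum.
results = [
    adaptive_gradient_descent(quadratic, quadratic_grad, 0.0, lr0)
    for lr0 in (1e-4, 0.1, 10.0)
]
```

On this toy objective a fixed step of 10.0 would diverge, while the adaptive rule recovers by shrinking the step until an update is accepted; the only overhead is one extra loss evaluation per iteration.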

First experiments on an HMM based double layer framework for automatic continuous speech recognition

Author: Caballero Galeote Mónica
Casar López Marta
Nogueiras Rodríguez Albino
Rodríguez Fonollosa José Adrián
Publication date: 01/01/2006

The usual approach to automatic continuous speech recognition is what can be called the acoustic-phonetic modelling approach. In this approach, voice is considered to hold two different kinds of information: acoustic and phonetic. Acoustic information is represented by some kind of feature extraction from the voice signal, and phonetic information is extracted from the vocabulary of the task by means of a lexicon or some other procedure. The main assumption in this approach is that models can be constructed that capture the correlation between both kinds of information. The main limitation of acoustic-phonetic modelling in speech recognition is its poor treatment of the variability present at both the phonetic level and the acoustic one. In this paper, we propose the use of a slightly modified framework where the usual acoustic-phonetic modelling is divided into two different layers: one closer to the voice signal, and the other closer to the phonetics of the sentence. By doing so we expect an improvement in modelling accuracy, as well as better management of acoustic and phonetic variability. Experiments carried out so far, using a very simplified version of the proposed framework, show a significant improvement in the recognition of a large vocabulary continuous speech task, and represent a promising starting point for future research. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC